The R Primer by Ekstrøm Claus Thorn

Author: Ekstrøm, Claus Thorn
Language: eng
Format: epub, pdf
Published: 2011-07-11T21:07:53+00:00


                                  9.445  < 2e-16 ***
pcPC2         0.1592     0.5050   0.315 0.752540
pcPC3        -0.7191     0.3273  -2.197 0.028032 *
pcPC4         0.9151     0.3691   2.479 0.013159 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 884.35  on 682  degrees of freedom
Residual deviance: 106.12  on 678  degrees of freedom
AIC: 116.12

Number of Fisher Scoring iterations: 8

We see from the logistic regression analysis that the first principal
component is highly significant, while principal components 3 and 4 are
only barely significant at the 5% level. Principal component 2 is not
significant when the three other principal components are part of the
model, so even though principal component 2 explains the second-largest
share of the variation in the explanatory variables, there is no
evidence that it affects the response.
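A model of the kind summarised above could be set up along the following
lines. This is only a sketch under assumptions: the data set (the complete
cases of the biopsy data frame from MASS), the standardisation of the
variables, and the object names complete, pca and pc are illustrative
choices that are not taken from the text, so the estimates need not match
the output shown.

```r
library(MASS)
data(biopsy)

# Assumption: keep complete cases only, since prcomp() does not accept NAs
complete <- na.omit(biopsy)

# Principal components of the nine predictors V1-V9 (columns 2 to 10),
# standardised so that each variable contributes on the same scale
pca <- prcomp(complete[, 2:10], scale. = TRUE)

# Keep the scores on the first four principal components
pc <- pca$x[, 1:4]

# Logistic regression of tumour class on the component scores;
# the coefficients are labelled pcPC1, ..., pcPC4
fit <- glm(class ~ pc, family = binomial, data = complete)
summary(fit)
```

Passing the score matrix pc as a single term is what produces coefficient
labels of the form pcPC2 in the summary.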

See also: Rule 3.31 for principal component analysis and Rule 3.8 for
logistic regression modelling. The pcr function from the pls package
can also be used for principal component regression.

3.33 Classify observations using linear discriminant analysis

Problem: You want to find a linear combination of features which can
be used for classification of two or more classes.

Solution: Discriminant analysis is a statistical technique for
classification of data into mutually exclusive groups. In linear
discriminant analysis we assume that the groups can be separated by a
linear combination of features that describe the objects, and with k
groups we need at most k − 1 discriminators to separate the classes.
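As a small illustration of the k − 1 rule, the sketch below fits a linear
discriminant analysis to the built-in iris data. The choice of iris is an
assumption made for illustration (it is not used in the text); it has
three species and four measurements, so two discriminators are expected.

```r
library(MASS)

# iris has k = 3 groups (Species) and four quantitative features
fit3 <- lda(Species ~ ., data = iris)

# The scaling component has one column of coefficients per discriminator
n_groups <- nlevels(iris$Species)
n_discriminants <- ncol(fit3$scaling)

n_groups
n_discriminants
```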

The function lda from the MASS package can be used for linear
discriminant analysis. Input for the lda function is a model formula
of the form group ~ x1 + x2 + ... where the response group is a
grouping factor and x1, x2, ... are quantitative discriminators. The
prior option can be set to give the prior probabilities of class
membership. If it is unspecified, the probabilities of class membership
are estimated from the class proportions in the dataset.
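The effect of the prior option can be sketched as follows, using the
biopsy data frame from MASS. The equal priors of 0.5 and the object names
fit_default and fit_equal are assumptions chosen for illustration; the
prior probabilities are given in the order of the factor levels.

```r
library(MASS)
data(biopsy)

# Default: priors are the class proportions observed in the data
fit_default <- lda(class ~ V1 + V2 + V3, data = biopsy)
fit_default$prior

# Check the order of the levels before supplying priors by position
levels(biopsy$class)

# Assumed equal priors for the two classes
fit_equal <- lda(class ~ V1 + V2 + V3, data = biopsy, prior = c(0.5, 0.5))
fit_equal$prior

# Cross-tabulate the predicted classes from the two fits
table(default = predict(fit_default)$class,
      equal = predict(fit_equal)$class)
```

The cross-tabulation shows whether any observations change predicted
class when the priors are changed.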

In the following code, we will make a model to classify breast cancer
type (“benign” or “malignant”) based on tumor clump thickness (V1),
uniformity of cell size (V2) and uniformity of cell shape (V3). The
variables are found in the biopsy data frame from the MASS package.

> library(MASS)
> data(biopsy)
> fit <- lda(class ~ V1 + V2 + V3, data=biopsy)
> fit
Call:
lda(class ~ V1 + V2 + V3, data = biopsy)

Prior probabilities of groups:
   benign malignant
0.6552217 0.3447783

Group means:
                V1       V2       V3
benign    2.956332 1.325328 1.443231
malignant 7.195021 6.572614 6.560166

Coefficients of linear discriminants:
         LD1
V1 0.2321486
V2 0.2574805
V3 0.2500765
> plot(fit, col="lightgray")

The prior probabilities of the groups and the resulting linear
discriminator are both seen in the output. The plot of the fitted model
is shown in Figure 3.14, and the type of plot produced depends on the
number of discriminators. If there is only one discriminator or if the
argument dimen=1 is set, then a histogram is plotted for each group; if
there are two discriminators, a scatter plot is drawn, and with more
than two a pairs plot is shown.

Figure 3.14: Example of lda output. Histograms of the values of the
linear discriminator are shown for observations from both the “benign”
and “malignant” groups (panels: group benign, group malignant).
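The values shown in the histograms are the scores on the single linear
discriminator. As a rough sketch, they can be related to the printed
coefficients as follows; the object names X, raw, scores and offset are
introduced here for illustration only.

```r
library(MASS)
data(biopsy)
fit <- lda(class ~ V1 + V2 + V3, data = biopsy)

# Raw linear combination of the three features using the LD1 coefficients
X <- as.matrix(biopsy[, c("V1", "V2", "V3")])
raw <- as.vector(X %*% fit$scaling)

# Discriminant scores as returned by predict()
scores <- as.vector(predict(fit)$x)

# The two differ only by a constant shift, because predict() centres
# the features before applying the coefficients
offset <- scores - raw
range(offset)

# Mean score within each group
tapply(scores, biopsy$class, mean)
```

Because the shift is constant, the separation between the groups is the
same whether the raw or the centred scores are used.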

The predict function provides a list of predicted classes (the class
component) and posterior probabilities (the posterior component)
when the result from lda is supplied as input. These can be used to
evaluate the sensitivity and specificity of the classification.

> result <- table(biopsy$class, predict(fit)$class)
> result

            benign malignant
  benign       448        10
  malignant     33       208
> sum(diag(result)) / sum(result)
[1] 0.9384835

We can see here that the linear discriminant analysis correctly
classifies 93.
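Sensitivity and specificity can be read off the same cross-tabulation.
The sketch below assumes that “malignant” is treated as the positive
class; that choice and the names sensitivity, specificity and accuracy
are illustrative and not taken from the text. With the table printed
above, the two ratios are 208/241 and 448/458.

```r
library(MASS)
data(biopsy)
fit <- lda(class ~ V1 + V2 + V3, data = biopsy)

# Rows: observed class, columns: predicted class
result <- table(observed = biopsy$class, predicted = predict(fit)$class)

# Assumption: "malignant" is the positive class
# Sensitivity: proportion of malignant tumours predicted as malignant
sensitivity <- result["malignant", "malignant"] / sum(result["malignant", ])

# Specificity: proportion of benign tumours predicted as benign
specificity <- result["benign", "benign"] / sum(result["benign", ])

# Overall proportion of correct classifications
accuracy <- sum(diag(result)) / sum(result)

c(sensitivity = sensitivity, specificity = specificity, accuracy = accuracy)
```

These figures are computed on the same data that were used to fit the
model, so they are likely to be somewhat optimistic for new observations.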


